NSF PAR Search | NSF Public Access Repository

Fast Multi-Modal Multi-Instance Support Vector Machine for Fine-grained Chest X-ray Recognition

https://doi.org/10.1109/ICDM58522.2023.00164

Seo, Hoon; Wang, Hua (December 2023, IEEE ICDM 2023)

Chest X-ray (CXR) analysis plays an important role in patient treatment, and a multitude of machine learning models have been applied to CXR datasets to automate this analysis. However, each patient has a different number of images per angle, so multi-modal learning must handle data that are missing for specific angles and times. Furthermore, the high dimensionality of multi-modal imaging data, whose shapes are inconsistent across the dataset, makes training challenging. To address these issues, we propose the Fast Multi-Modal Support Vector Machine (FMMSVM), which incorporates modality-specific factorization to handle CXRs missing at specific angles. Our model can adjust the fine-grained details in feature extraction, and we provide an efficient optimization algorithm that scales to a large number of features. In our experiments, FMMSVM shows clearly improved classification performance.
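As a minimal illustration of the missing-angle problem described above, the Python sketch below mean-pools each patient's images per view and zero-fills any view that has no images, returning a mask that records which views were observed. This is a simple preprocessing baseline, not FMMSVM's modality-specific factorization; the view names (`PA`, `AP`, `LATERAL`), the helper `pool_patient_views`, and the assumption that each image has already been reduced to a fixed-length feature vector are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical view names; the abstract does not say which angles are used.
VIEWS = ("PA", "AP", "LATERAL")


def pool_patient_views(
    images_by_view: Dict[str, List[Sequence[float]]],
    feature_dim: int,
    views: Tuple[str, ...] = VIEWS,
) -> Tuple[List[float], List[int]]:
    """Mean-pool the per-image feature vectors of each view into one vector.

    A view with no images is zero-filled and marked 0 in the mask, so every
    patient yields a vector of length len(views) * feature_dim.
    (Hypothetical baseline, not the FMMSVM factorization.)
    """
    features: List[float] = []
    mask: List[int] = []
    for view in views:
        imgs = images_by_view.get(view, [])
        if imgs:
            # Average each feature dimension over however many images this view has.
            pooled = [sum(img[d] for img in imgs) / len(imgs) for d in range(feature_dim)]
            mask.append(1)
        else:
            pooled = [0.0] * feature_dim
            mask.append(0)
        features.extend(pooled)
    return features, mask


# A patient with two PA images, one AP image, and no lateral image.
feats, mask = pool_patient_views({"PA": [[1.0, 2.0], [3.0, 4.0]], "AP": [[0.5, 0.5]]}, feature_dim=2)
print(feats, mask)  # [2.0, 3.0, 0.5, 0.5, 0.0, 0.0] [1, 1, 0]
```

Because missing views are zero-filled and flagged rather than dropped, every patient maps to a vector of the same length, which is what a standard classifier expects.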
A linear primal–dual multi-instance SVM for big data classifications

https://doi.org/10.1007/s10115-023-01961-z

Brand, Lodewijk; Seo, Hoon; Baker, Lauren Zoe; Ellefsen, Carla; Sargent, Jackson; Wang, Hua (August 2023, Knowledge and Information Systems)

Multi-instance learning (MIL) handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting to classify bags that contain any number of instances. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper, we present a novel primal–dual multi-instance support vector machine that can operate efficiently on large-scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers (ADMM). The approach presented in this work scales to large-scale data because it avoids iteratively solving the quadratic programming problems that are broadly used to optimize SVM-based MIL algorithms. In addition, we extend our derivation with an additional optimization designed to avoid solving a least-squares problem in our algorithm, which allows our approach to handle a large number of features as well as bags. Finally, we derive a kernel extension of our approach to learn nonlinear decision boundaries for enhanced classification capabilities. We apply our approach to both synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method.
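To make the bag/instance terminology concrete, the sketch below shows the classic MI-SVM-style scoring rule, in which a bag is scored by its highest-scoring instance, together with a regularized bag-level hinge loss. It only illustrates the kind of objective that SVM-based MIL methods optimize; it is not the paper's primal–dual ADMM algorithm, and the function names and the regularization weight `lam` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helper names; this is not the paper's code or API.


def bag_score(w: Sequence[float], b: float, bag: List[Sequence[float]]) -> float:
    """Score a bag by its highest-scoring instance: max_i (w . x_i + b)."""
    return max(sum(wj * xj for wj, xj in zip(w, x)) + b for x in bag)


def bag_hinge_loss(
    w: Sequence[float],
    b: float,
    bags: List[List[Sequence[float]]],
    labels: List[int],
    lam: float = 0.1,
) -> float:
    """Regularized bag-level hinge loss.

    lam/2 * ||w||^2 + mean over bags of max(0, 1 - Y * bag_score), with Y in {+1, -1}.
    """
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    losses = [max(0.0, 1.0 - y * bag_score(w, b, bag)) for bag, y in zip(bags, labels)]
    return reg + sum(losses) / len(losses)


w, b = [1.0, 0.0], 0.0
pos_bag = [[0.2, 1.0], [2.0, 0.0]]    # one instance scores 2.0, so the bag scores 2.0
neg_bag = [[-1.5, 3.0], [-0.5, 0.0]]  # best instance scores -0.5
print(bag_score(w, b, pos_bag), bag_score(w, b, neg_bag))  # 2.0 -0.5
loss = bag_hinge_loss(w, b, [pos_bag, neg_bag], [1, -1])
print(math.isclose(loss, 0.3))  # True
```

Scoring by the maximum instance encodes the usual MIL assumption that a bag is positive if at least one of its instances is positive.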
Scalable Multi-Instance Multi-Shape Support Vector Machine for Whole Slide Breast Histopathology

https://doi.org/10.1109/ICKG55886.2022.00036

Seo, Hoon; Brand, Lodewijk; Barco, Lucia Saldana; Wang, Hua (November 2022, 2022 IEEE International Conference on Knowledge Graph (ICKG))

Histopathological image analysis is critical in cancer diagnosis and treatment. Because histopathological images are huge, most existing works treat the whole slide image (WSI) as a bag and its patches as instances. However, these approaches analyze only patches of a fixed shape, whereas malignant lesions can take varied shapes. To address this challenge, we propose the Multi-Instance Multi-Shape Support Vector Machine (MIMSSVM) to analyze multiple images (instances) jointly, where each instance consists of multiple patches of varied shapes. Our approach can identify varied morphological abnormalities in nuclear shape across the multiple images. In addition to this multi-instance multi-shape learning capability, we provide an efficient algorithm to optimize the proposed model that scales well to a large number of features. Our experimental results show that the proposed MIMSSVM method outperforms existing SVM and recent deep learning models in histopathological classification. The proposed model also identifies the tissue segments in an image that indicate an abnormality, which is useful for the early detection of malignant tumors.
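One way to picture "patches of varied shapes" is to tile the same image with several patch geometries at once. The sketch below enumerates non-overlapping patch boxes for a list of (height, width) shapes. The shapes, the non-overlapping tiling, and the helper name `multi_shape_patches` are assumptions for illustration; the abstract does not specify how MIMSSVM samples its patches.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width) in pixels


def multi_shape_patches(img_h: int, img_w: int, shapes: List[Tuple[int, int]]) -> List[Box]:
    """Tile one image with non-overlapping patches for each (height, width) shape.

    Patches that would cross the image border are skipped, so every returned
    box lies entirely inside the image. (Hypothetical sampling scheme.)
    """
    boxes: List[Box] = []
    for ph, pw in shapes:
        for top in range(0, img_h - ph + 1, ph):
            for left in range(0, img_w - pw + 1, pw):
                boxes.append((top, left, ph, pw))
    return boxes


# Hypothetical shapes: square, wide, and tall patches on a 4x4 toy image.
boxes = multi_shape_patches(4, 4, [(2, 2), (1, 4), (4, 1)])
print(len(boxes))  # 4 squares + 4 wide strips + 4 tall strips = 12
```

In a multi-instance setting, the boxes from one image would together form that image's instance, and a slide would be the bag of such images.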
Scaling multi-instance support vector machine to breast cancer detection on the BreaKHis dataset

https://doi.org/10.1093/bioinformatics/btac267

Seo, Hoon; Brand, Lodewijk; Barco, Lucia Saldana; Wang, Hua (June 2022, Bioinformatics)

Motivation: Breast cancer is a type of cancer that develops in breast tissue and, after skin cancer, is the most commonly diagnosed cancer in women in the United States. Given that early diagnosis is imperative to prevent breast cancer progression, many machine learning models have been developed in recent years to automate the histopathological classification of the different types of carcinomas. However, many of them do not scale to large datasets.
Results: In this study, we propose a novel Primal-Dual Multi-Instance Support Vector Machine to determine which tissue segments in an image exhibit an indication of an abnormality. We derive an efficient optimization algorithm for the proposed objective by bypassing the quadratic programming and least-squares problems that are commonly employed to optimize Support Vector Machine models. The proposed method is computationally efficient and therefore scales to large datasets. We applied our method to the public BreaKHis dataset and achieved promising prediction performance and scalability for histopathological classification.
Availability and implementation: Software is publicly available at: https://1drv.ms/u/s!AiFpD21bgf2wgRLbQq08ixD0SgRD?e=OpqEmY.
Supplementary information: Supplementary data are available at Bioinformatics online.
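The point about bypassing quadratic programming can be illustrated with a well-known QP-free SVM solver. The sketch below is a Pegasos-style stochastic subgradient method for a linear SVM without a bias term. This is a different, swapped-in technique chosen only to show that an SVM objective can be optimized without a QP solver; it is not the paper's primal-dual algorithm, and the toy data and hyperparameters are hypothetical.

```python
import random
from typing import List, Sequence


def pegasos_train(
    X: List[Sequence[float]],
    y: List[int],
    lam: float = 0.01,
    iters: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Train a linear SVM (no bias term) with Pegasos-style stochastic subgradient steps.

    Minimizes lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * w . x_i) without solving a QP.
    Swapped-in illustration, not the paper's primal-dual method.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))  # uses w before the update
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer gradient step
        if margin < 1.0:  # hinge loss is active for this sample
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


X = [[2.0, 1.0], [3.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0], [-3.0, -2.0]]
y = [1, 1, 1, -1, -1, -1]
w = pegasos_train(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1 for x in X]
print(preds == y)  # True
```

Each step touches a single sample, so the per-iteration cost is linear in the number of features, which is the general reason QP-free solvers scale to large datasets.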
Learning Deeply Enriched Representations of Longitudinal Imaging-Genetic Data to Predict Alzheimer’s Disease Progression

https://doi.org/10.1109/BIBM52615.2021.9669428

Seo, Hoon; Wang, Hua (December 2021, 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Alzheimer’s Disease (AD) is a progressive memory disorder that causes irreversible cognitive decline; therefore, early diagnosis is imperative to prevent the progression of AD. To this end, many biomarker analysis models have been presented for early AD detection. However, these models may not realize the full potential of the data because they fail to integrate longitudinal (dynamic) phenotypic data with (static) genetic data. Some may also not fully utilize both labeled and unlabeled samples. To overcome these limitations, we propose a semi-supervised enrichment learning method that learns a fixed-length vectorial representation for each participant, through which the static data record can be integrated with the dynamic data records. We applied our new method to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort and achieved 75% accuracy on multiclass AD progression prediction one year in advance.
Learning Semi-Supervised Representation Enrichment Using Longitudinal Imaging-Genetic Data

https://doi.org/10.1109/BIBM49941.2020.9313310

Seo, Hoon; Brand, Lodewijk; Wang, Hua (December 2020, 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
Alzheimer's Disease (AD) is a progressive memory disorder that causes irreversible cognitive decline. Recently, many statistical learning methods have been presented to predict cognitive decline using longitudinal imaging data. However, missing records, which are widespread in longitudinal neuroimaging data, pose a critical challenge to using these data effectively in machine learning models. To tackle this difficulty, in this paper we propose a novel approach that integrates longitudinal (dynamic) phenotypic data and static genetic data to learn a fixed-length biomarker representation, using the enrichment learned from the temporal data in multiple imaging modalities. With this enriched biomarker representation, a fixed-length vector per participant, conventional machine learning models can be used to predict clinical outcomes associated with AD. We applied our new method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort and achieved promising experimental results that validate its effectiveness.
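For contrast with the learned enrichment described above, the sketch below builds a fixed-length vector per participant in the most naive way: summary statistics over the observed visits (missing visits given as `None`), the visit count, and the static genetic features appended at the end. This hand-crafted baseline is an assumption for illustration, not the authors' semi-supervised enrichment method; the helper `summarize_longitudinal`, the feature layout, and the toy SNP-count encoding are hypothetical.

```python
from typing import List, Optional, Sequence


def summarize_longitudinal(
    visits: Sequence[Optional[Sequence[float]]],
    genetic: Sequence[float],
    dim: int,
) -> List[float]:
    """Collapse a variable-length visit history into a fixed-length vector.

    Naive hypothetical baseline, not the paper's enrichment method.
    Missing visits are passed as None. The output is: per-feature mean over
    observed visits, the last observed visit, the number of observed visits,
    then the static genetic features, i.e. 2 * dim + 1 + len(genetic) values.
    """
    observed = [v for v in visits if v is not None]
    if observed:
        mean = [sum(v[d] for v in observed) / len(observed) for d in range(dim)]
        last = [float(x) for x in observed[-1]]
    else:  # no imaging record at all: fall back to zeros
        mean = [0.0] * dim
        last = [0.0] * dim
    return mean + last + [float(len(observed))] + [float(g) for g in genetic]


# Three scheduled visits, the second one missing; genetic features are hypothetical SNP counts.
vec = summarize_longitudinal([[1.0, 2.0], None, [3.0, 6.0]], genetic=[0, 1, 2], dim=2)
print(vec)  # [2.0, 4.0, 3.0, 6.0, 2.0, 0.0, 1.0, 2.0]
```

Any such fixed-length output lets conventional classifiers consume participants with different numbers of visits; the papers' contribution is learning a richer representation than these hand-picked statistics.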
Integrating Static and Dynamic Data for Improved Prediction of Cognitive Declines Using Augmented Genotype-Phenotype Representations

Seo, Hoon; Brand, Lodewijk; Wang, Hua; Nie, Feiping (January 2021, Proceedings of the AAAI Conference on Artificial Intelligence)
Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that causes severe problems in patients’ thinking, memory, and behavior. An early diagnosis is crucial to prevent AD progression; to this end, many algorithmic approaches have recently been proposed to predict cognitive decline. However, these predictive models often fail to integrate heterogeneous genetic and neuroimaging biomarkers and struggle to handle missing data. In this work, we propose a novel objective function and an associated optimization algorithm to identify cognitive decline related to AD. Our approach is designed to incorporate dynamic neuroimaging data by way of a participant-specific augmentation combined with multimodal data integration aligned via a regression task. To incorporate additional side information, our approach uses structured regularization techniques popularized in recent AD literature. With the fixed-length vector representation learned from the dynamic and static modalities, conventional machine learning methods can be used to predict the clinical outcomes associated with AD. Our experimental results show that the proposed augmentation model improves prediction performance on cognitive assessment scores for a collection of popular machine learning algorithms. We interpret the results of our approach to validate existing genetic and neuroimaging biomarkers that have been shown to be predictive of cognitive decline.
